Fujitsu Laboratories TREC7 Report
Abstract

In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques for the basic ranking systems, including reference measures, passage retrieval, and data fusion. Some techniques were used in the official run; others were not used because of time limitations. We applied the text clustering techniques for query expansion with a text clustering engine. Clustering-based query expansion uses the top N best text clusters from the top 1000 documents instead of just using the top N documents. Clustering-based query expansion produces better results than simple query expansion based on passage retrieval. We submitted three runs: Flab7at, Flab7ad, and Flab7atE. Flab7at is a combination of ranking and query expansion by clustering the top 1000 documents on the title field, Flab7ad is a combination of ranking and query expansion by clustering on the description field, and Flab7atE is a combination of ranking with Boolean (existence) operators and query expansion by passage retrieval.

2 System Description

2.1 Overall

We participated in TREC with two groups. One group was concerned with search engines, which included index construction, the searching process, and normal query expansion. The other group was concerned with the Environment for Document Analysis (EDA), which is related to query expansion with text clustering. The two groups were at different locations, and the two systems were developed in completely different environments. To combine the two systems, we constructed an experimental search procedure using Perl scripts. We also wrote a TREC-specific local procedure in Perl for such tasks as removing stop patterns from the input query.

2.2 The Search System Teraa

Teraa [1], [2] is a full-text search system designed to provide an adequate number of efficient functions for commercial service, and to provide parameter combination testing and easy extension for information retrieval experiments.
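The clustering-based query expansion summarized in the abstract (expansion terms drawn from the N best clusters of the top-ranked documents rather than from the top N documents) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the report: the whitespace tokenizer, the greedy Jaccard clustering, the cluster score (sum of member retrieval scores), the frequency-based term selection, and all function and parameter names.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real indexing."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(ranked_docs, threshold=0.3):
    """Greedy single-pass clustering of (text, score) pairs.

    A document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster. (Assumed algorithm.)
    """
    clusters = []  # each cluster: {"seed": set, "docs": [(terms, score)]}
    for text, score in ranked_docs:
        terms = set(tokenize(text))
        for cluster in clusters:
            if jaccard(terms, cluster["seed"]) >= threshold:
                cluster["docs"].append((terms, score))
                break
        else:
            clusters.append({"seed": terms, "docs": [(terms, score)]})
    return clusters


def expand_query(query, ranked_docs, n_clusters=2, n_terms=3, threshold=0.3):
    """Append the most frequent new terms found in the N best clusters."""
    clusters = cluster_documents(ranked_docs, threshold)
    # Assumed cluster quality: sum of the members' retrieval scores.
    best = sorted(
        clusters,
        key=lambda c: sum(score for _, score in c["docs"]),
        reverse=True,
    )[:n_clusters]
    query_terms = tokenize(query)
    counts = Counter()
    for cluster in best:
        for terms, _ in cluster["docs"]:
            counts.update(t for t in terms if t not in query_terms)
    # Sort by descending count, then alphabetically, for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:n_terms]]
```

In this sketch an off-topic document that happens to rank highly forms its own small, low-scoring cluster and contributes no expansion terms, which is one plausible reason cluster-level selection could help over taking the top N documents directly.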
To satisfy both the commercial and experimental requirements, Teraa has many functions and extension points, as described below.

1. Basic search operations: Boolean; Boolean Ranking (ranking with Boolean operators, for commercial use); Ranking (accumulator method); Near operators for phrases; the Not operator; the existence operator for terms in ranking; and parameter control, such as a df limiter for ranking, term evaluation order control for ranking, etc.
2. Index types: inverted file index for full-text search, number array index for range search, number array index for multiple occurrences of a number in a single text (e.g., an IPC code in a patent text), and B-tree index for item search.

…
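To make the accumulator method and the df limiter listed above more concrete, here is a minimal Python sketch of term-at-a-time ranking over an inverted file index. It is not Teraa's implementation: the tf-idf weighting, the rarest-first evaluation order, the reading of "df limiter" as "skip terms whose document frequency exceeds a cap", and all names (`build_inverted_index`, `rank`, `df_limit`) are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def rank(query, index, n_docs, df_limit=None):
    """Term-at-a-time ranking with one score accumulator per document.

    Terms are evaluated rarest first (assumed evaluation order). Terms
    whose document frequency exceeds df_limit are skipped (assumed
    meaning of a df limiter).
    """
    terms = [t for t in set(query.lower().split()) if t in index]
    terms.sort(key=lambda t: (len(index[t]), t))
    accumulators = defaultdict(float)
    for term in terms:
        postings = index[term]
        df = len(postings)
        if df_limit is not None and df > df_limit:
            continue  # too common to be worth scoring
        idf = math.log(1 + n_docs / df)  # assumed weighting
        for doc_id, tf in postings.items():
            accumulators[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
```

Processing rare terms first and capping df keeps the number of touched accumulators small, which is the usual motivation for exposing both controls as tunable parameters.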
Similar resources
Fujitsu Laboratories TREC9 Report
This year a Fujitsu Laboratory team participated in the web tracks. For TREC9 we experimented with passage retrieval, which is expected to be effective for Web pages that contain more than one topic. To split documents into passages, we used an NLP-based paragraph detection program rather than a fixed (variable) window size. But it did not produce better results for the TREC9 Web data. For indexing large web data faster, ...
Fujitsu Laboratories TREC8 Report
This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers, and we built a simple stop-pattern removal technique for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization,...
CEAX’s Learning Support System to Explore Cultural Heritage Objects without Keyword Search
Taizo Yamada, Kenro Aihara, Noriko Kando, Satoko Fujisawa, Yusuke Uehara, Takayuki Baba, Shigemi Nagata, Takashi Tojo, Yuko Hiroshima and Jun Adachi 1 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 2 Dept. of Informatics, the Graduate University for Advanced Studies, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 3 Fujitsu Laboratories Ltd., 1-1 Kamikodanaka 4, Na...
Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track
This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers, and we built a simple stop-pattern removal technique for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization,...